Sharing Data and Work Across Concurrent Analytical Queries
نویسندگان
چکیده
Today’s data deluge enables organizations to collect massive data, and analyze it with an ever-increasing number of concurrent queries. Traditional data warehouses (DW) face a challenging problem in executing this task, due to their query-centric model: each query is optimized and executed independently. This model results in high contention for resources. Thus, modern DW depart from the querycentric model to execution models involving sharing of common data and work. Our goal is to show when and how a DW should employ sharing. We evaluate experimentally two sharing methodologies, based on their original prototype systems, that exploit work sharing opportunities among concurrent queries at run-time: Simultaneous Pipelining (SP), which shares intermediate results of common sub-plans, and Global Query Plans (GQP), which build and evaluate a single query plan with shared operators. First, after a short review of sharing methodologies, we show that SP and GQP are orthogonal techniques. SP can be applied to shared operators of a GQP, reducing response times by 20%-48% in workloads with numerous common sub-plans. Second, we corroborate previous results on the negative impact of SP on performance for cases of low concurrency. We attribute this behavior to a bottleneck caused by the push-based communication model of SP. We show that pull-based communication for SP eliminates the overhead of sharing altogether for low concurrency, and scales better on multi-core machines than push-based SP, further reducing response times by 82%-86% for high concurrency. Third, we perform an experimental analysis of SP, GQP and their combination, and show when each one is beneficial. We identify a trade-off between low and high concurrency. In the former case, traditional query-centric operators with SP perform better, while in the latter case, GQP with shared operators enhanced by SP give the best results.
منابع مشابه
Sharing data and work across queries in analytical workloads
Traditionally, query execution engines in relational databases have followed a query-centric model: They optimize and execute each incoming query using a separate execution plan, independent of other concurrent queries. For workloads with low contention for resources, or workloads with short-lived queries, this model makes the optimization phase faster and creates efficient execution plans. For...
متن کاملRequest Window: an Approach to Improve Throughput of RDBMS-based Data Integration System by Utilizing Data Sharing Across Concurrent Distributed Queries
This paper focuses on the problem of improving distributed query throughput of the RDBMS-based data integration system that has to inherit the query execution model of the underlying RDBMS: execute each query independently and utilize a global buffer pool mechanism to provide disk page sharing across concurrent query execution processes. However, this model is not suitable for processing concur...
متن کاملSimultaneous Query Pipelines in QPipe
Data warehousing and scientific database applications operate on massive datasets and are characterized by complex queries accessing large portions of the database. Concurrent queries often exhibit high data and computation overlap, e.g., they access the same relations on disk, compute similar aggregates, or share intermediate results. Unfortunately, run-time sharing in modern database engines ...
متن کاملMQJoin: Efficient Shared Execution of Main-Memory Joins
Database architectures typically process queries one-at-a-time, executing concurrent queries in independent execution contexts. Often, such a design leads to unpredictable performance and poor scalability. One approach to circumvent the problem is to take advantage of sharing opportunities across concurrently running queries. In this paper we propose Many-Query Join (MQJoin), a novel method for...
متن کاملA Parallel Processing Strategy for Evaluating Recursive Queries
The set of resolvents generated by a recursive intension in a lirst-order database is treated as a set of concurrent database queries. A strategy for egiciently ev,aluating these concurrent queries in a multi-processor environment is presented. The strategy combines three query processing techniques, namely, query decomposition, intermediate result sharing and data-flow and pipelined query exec...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- PVLDB
دوره 6 شماره
صفحات -
تاریخ انتشار 2013